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ABSTRACT 

In this work we present a new and efficient Bayesian method for nonlinear three dimensional 
large scale structure inference. We employ a Hamiltonian Monte Carlo (HMC) sampler to ob- 
tain samples from a multivariate highly non-Gaussian lognormal Poissonian density posterior 
given a set of observations. The HMC allows us to take into account the nonlinear relations 
between the observations and the underlying density field which we seek to recover. As the 
HMC provides a sampled representation of the density posterior any desired statistical sum- 
mary, such as the mean, mode or variance, can be calculated from the set of samples. Further, 
it permits us to seamlessly propagate non-Gaussian uncertainty information to any final quan- 
tity inferred from the set of samples. The developed method is extensively tested in a variety of 
test scenarios, taking into account a highly structured survey geometry and selection effects. 
Tests with a mock galaxy catalog based on the millennium run show that the method is able to 
recover the filamentary structure of the nonlinear density field. The results further demonstrate 
the feasibility of non-Gaussian sampling in high dimensional spaces, as required for precision 
nonlinear large scale structure inference. The HMC is a flexible and efficient method, which 
permits for simple extension and incorporation of additional observational constraints. Thus, 
the method presented here provides an efficient and flexible basis for future high precision 
large scale structure inference. 

Key words: large scale - reconstruction -Bayesian inference - cosmology - observations - 
methods - numerical 



1 INTRODUCTION 

Modern large galaxy surveys allow us to probe cosmic large scale 
structure to very high accuracy if the enormous amount of data can 
be processed and analyzed efficiently. Especially, precision recon- 
struction of the three dimensional density field from observations 
poses complex numerical challenges. For this reason, several re- 
construction methods and attempts to recover the underlying den- 
sity field from galaxy observations have been presented in litera- 
ture (see e.g.|Bertschinger & Dekel|1989[[T99TJ|Lahav et al.|1994| 
Hoffman|19941|Fisher et al.|1995t|Bistolas & Hoffman! 1 998 HWeb- 



ster et al. 1997 ; Schmoldt et al. 1999 ; Zaroubi 2002 ; Erdogdu et al. 
2004 [ |Kitaura et al.||2009[ |Jasche||20091 >. Recently, [Kitaura et al. 
(2009 ) presented a high resolution Wiener reconstruction of the 
Sloan Digital Sky Survey (SDSS) matter density field, and demon- 
strated the feasibility of precision large scale structure analysis. The 
Wiener filtering approach is based on a linear data model, which 
takes into account several observational effects, such as survey ge- 
ometry, selection effects and noise ( Kitaura & EnBlin 2008 ; Kitaura 
[eTaL][2009l |Jasche|2 009 ). Although, the Wiener filter has proven 
to be extremely efficient for three dimensional matter field recon- 
struction, it still relies on a Gaussian approximation of the density 
posterior. While this is an adequate approximation for the largest 
scales, precision recovery of nonlinear density structures may re- 



quire non-Gaussian posteriors. Especially, the detailed treatment 
of the non-Gaussian behavior and structure of the Poissonian shot 
noise contribution may allow for more precise recovery of poorly 
sampled objects. In addition, for a long time it has been suggested 
that the fully evolved nonlinear matter field can be well described 
by lognormal s tatistics (see e.g.|Hub ble 1934, Peebles 1980, Coles 
|& Jones|199T]|Gaztanaga & Yokoyama||1993[ |Kayo et al.|200lf 
These discussions seem to advocate the use of a lognormal Pois- 
sonian posterior for large scale structure inference. Several meth- 
ods have been proposed to take into account non-Gaussian density 
posteriors (see e.g. Saunders & Ballinger 2000, Kitaura & EnBlin 
|2008l[Enssiin et al.|2008[|Kitaura et al.|2009| >. 

However, if the recovered nonlinear density field is to be used 
for scientific purposes, the method not only has to provide a single 
estimate, such as a mean or maximum a postiori reconstruction, but 
it should also provide uncertainty information, and the means to 
nonlinearly propagate this uncertainty to any final quantity inferred 
from the recovered density field. 

For this reason, here we propose a new Bayesian method for 
nonlinear large scale structure inference. The developed computer 
program HADES (HAmiltonian Density Estimation and Sampling) 
explores the posterior distribution via an Hamiltonian Monte Carlo 
(HMC) sampling scheme. Unlike conventional Metropolis Hast- 
ings algorithms, which move through the parameter space by a 
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random walk, and therefore require prohibitive amounts of steps 
to explore high dimensional spaces, the HMC sampler suppresses 
random walk behavior by introducing a persistent motion of the 
Markov chain through the parameter space (Duane et al. 1987 ; |NeaT| 
1993 1996). In this fashion, the HMC sampler maintains a reason- 
able efficiency even for high dimensional problems (Hanson 2001). 
The HMC sampler has been widely used in Bayesian computation 
(see e.g. Neal 1993). In cosmology it has been employed for cosmo- 
logical parameter estimation and CMB data analysis (Hajian 2007 ; 
|Taylor et al.p008] >. 

In this work we demonstrate that the HMC can efficiently be 
used to sample the lognormal Poissonian posterior even in high di- 
mensional spaces. In this fashion, the method is able to take into 
account the nonlinear relationship between the observation and the 
underlying density which we seek to recover. The scientific output 
of the HMC is a sampled representation of the density posterior. For 
this reason, any desired statistical summary such as mean, mode 
and variance can easily be calculated from the HMC samples. Fur- 
ther, the full non-Gaussian uncertainty information can seamlessly 
be propagated to any finally estimated quantity by simply applying 
the according estimation procedure to all samples. This allows us 
to estimate the accuracy of conclusions drawn from the analyzed 
data. 

In this work, we begin, in section[2] by presenting a short jus- 
tification for the use of the lognormal distribution as a prior for 
nonlinear density inference, followed by a discussion of the log- 
normal Poissonian posterior in section [3] Section [4] outlines the 
HMC method. In section [5] the Hamiltonian equations of motion 
for the lognormal Poissonian posterior are presented. Details of the 
numerical implementation are described in section [6] The method 
is extensively tested in section |7]by applying HADES to generated 
mock observations, taking into account a highly structured survey 
geometry and selection effects. In section|8]we summarize and con- 
clude. 



2 THE LOGNORMAL DISTRIBUTION OF DENSITY 

In standard cosmological pictures, it is assumed that the initial seed 
perturbations in the primordial density field originated from an in- 
flationary phase in the early stages of the Big Bang. This inflation- 
ary phase enhances microscopic quantum fluctuations to macro- 
scopic scales yielding the initial density fluctuations required for 
gravitational collapse. These theories predict the initial density field 
amplitudes to be Gaussian distributed. However, it is obvious that 
Gaussianity of the density field can only be true in the limit \8\ «c 1, 
where 6 is the density contrast. In fully evolved density fields with 
amplitudes of cr 8 > 1, as observed in the sky at scales of galaxies, 
clusters and super clusters, a Gaussian density distribution would 
allow for negative densities, and therefore would violate weak and 
strong energy conditions. In particular, it would give rise to nega- 
tive mass (6 < -1). Therefore, in the course of gravitational struc- 
ture formation the density field must have changed its statistical 
properties. |Coles &~J ones (1991) argue that assuming Gaussian ini- 
tial conditions in the density and velocity distributions will lead to 
a log-normally distributed density field. It is a direct consequence 
of the continuity equation or the conservation of mass. 

Although, the exact probability distribution for the density 
field in nonlinear regimes is not known, the lognormal distribu- 
tion seems to be a good phenomenological guess with a long his- 
tory. Already Hubble noticed that galaxy counts in two dimensional 
cells on the sky can be well approximated by a lognormal distribu- 



tion (Hubble 1934). Subsequently, the lognormal distribution has 
been extensively discussed and agreements with galaxy observa- 



tions have been found (e.g. Hubble 1934 


Peebles 1980, Coles & 


Jones 
et al. 


1991 Gaztanaga & Yokoyama 1993 


Kayo etal.|2001|).|Kayo 


( 20011 studied the probability distribution of cosmological 



nonlinear density fluctuations from N-body simulations with Gaus- 
sian initial conditions. They found that the lognormal distribution 
accurately describes the nonlinear density field even up to values of 
the density contrast of 6 ~ 100. 

Therefore, according to observations and theoretical consid- 
erations, we believe, that the statistical behavior of the nonlinear 
density field can be well described by a multivariate lognormal dis- 
tribution, as given by: 



msk\\Q) ■ 
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,(1) 



where Q is the covariance matrix of the lognormal distribution and 
\ii describes a constant mean field given by: 

This probability distribution, seems to be an adequate prior choice 
for reconstructing the present density field. However, using such a 
prior requires highly nonlinear reconstruction methods, as will be 
presented in the following. 



3 LOGNORMAL POISSONIAN POSTERIOR 

Studying the actual matter distribution of the Universe requires 
to draw inference from some observable tracer particle. The most 
obvious tracer particles for the cosmic density field are galaxies, 
which tend to follow the gravitational potential of matter. As galax- 
ies are discrete particles, the galaxy distribution can be described as 
a specific realization drawn from an inhomogeneous Poisson pro- 
cess (see e.g. |Layzer|1956||Peebles|1980[partmez & Saar|2002| ). 
The according probability distribution is given as: 



P({N 8 k }\{A k }) = Y\ 



Nil 



(3) 



where N% is the observed galaxy number at position x k in the sky 
and A k is the expected number of galaxies at this position. The mean 
galaxy number is related to the signal s k via: 



A k =R k N(l+B(s) k ), 



(4) 



where R k is a linear response operator, incorporating survey geome- 
tries and selection effects, N is the mean number of galaxies in the 
volume and B(x) k is a nonlinear, non local, bias operator at posi- 
tion x k . The lognormal prior given in equation ([TJ together with the 
Poissonian likelihood given in equation (3J yields the lognormal 
Poissonian posterior, for the density contrast s k given some galaxy 



observations Nf 



P({s k }\{N g k }) = 



^2ndet{Q) 



n 



i 1 + Si 



U- ^ (5) 



Nil 



However, this posterior greatly simplifies if we perform the change 
of variables by introducing r k = ln(l + s k ). Note, that this change of 
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variables is also numerically advantageous, as it prevents numerical 
instabilities at values 6 1. Hence, we yield the Posterior 



P({r k }\{N 8 k }) = 



^2ndet(Q) 



(R k N{\+B{e r -X) k jf k e 



R t N(i+B(e r -l) t ) 



Nil 



(6) 



It is important to note, that this is a highly non-Gaussian distri- 
bution, and nonlinear reconstruction methods are required in order 
to perform accurate matter field reconstructions in the nonlinear 
regime. In example, estimating the maximum a postiori values from 
the lognormal Poissonian distribution involves the solution of im- 
plicit equations. However, we are not solely interested in a single 
estimate of the density distribution, we rather prefer to draw sam- 
ples from the lognormal Poissonian posterior. In the following, we 
are therefore describing a numerically efficient method to sample 
from this highly non-Gaussian distribution. 



4 HAMILTONIAN SAMPLING 

As already described in the previous section the lognormal Poisso- 
nian posterior will involve highly nonlinear reconstruction methods 
and will therefore be numerically demanding. Nevertheless, since 
we propose a Bayesian method, we are not interested in only pro- 
viding a single estimate of the density field, but would rather be 
able to sample from the full non-Gaussian posterior. Unlike, in the 
Gibbs sampling approach to density field sampling, as proposed in 
Jasche (2009), there unfortunately exists no known way to directly 
draw samples from the lognormal Poissonian distribution. For this 
reason, a Metropolis-Hastings sampling mechanism has to be em- 
ployed. 

However, the Metropolis-Hastings has the numerical disad- 
vantage that not every sample will be accepted. A low acceptance 
rate can therefore result in a prohibitive numerical scaling for the 
method, especially since we are interested in estimating full three 
dimensional matter fields which usually have about 10 6 or more 
free parameters s k . This high rejection rate is due to the fact, that 
conventional Markov Chain Monte Carlo (MCMC) methods move 
through the parameter space by a random walk and therefore re- 
quire a prohibitive amount of samples to explore high-dimensional 
spaces. Given this situation, we propose to use a Hybrid Monte 
Carlo method, which in the absence of numerical errors, would 
yield an acceptance rate of unity. 

The so called Hamiltonian Monte Carlo (HMC) method ex- 
ploits techniques developed to follow classical dynamical particle 
motion in potentials ( |Duane et al.| 1987| |Neal|1993[ fT996] >. In this 
fashion the Markov sampler follows a persistent motion through 
the parameter space, supressing the random walk behavior. This en- 
ables us to sample with reasonable efficiency in high dimensional 
spaces (Hanson 2001). 

The idea of the Hamiltonian sampling can be easily explained. 
Suppose, that we wish to draw samples from the probability distri- 
bution P({Xi}), where {xf} is a set consisting of the N elements x t . If 
we interpret the negative logarithm of this posterior distribution as 
a potential: 

i/,(x) = -ln(P(x)), (7) 
and by introducing a 'momentum' variable p t and a 'mass matrix' 



M, as nuisance parameters, we can formulate a Hamiltonian de- 
scribing the dynamics in the multi dimensional phase space. Such 
a Hamiltonian is then given as: 



n = J]J]\p i M;/p J + m, 



(8) 



As can be seen in equation {8}, the form of the Hamiltonian is such, 
that this distribution is separable into a Gaussian distribution in the 
momenta {pi} and the target distribution P{{xi\) as: 



(9) 



It is therefore obvious that, marginalizing over all momenta will 
yield again our original target distribution P({xi}). 

Our task now is to draw samples from the joint distribution, 
which is proportional to exp(-H). To find a new sample of the 
joint distribution we first draw a set of momenta from the distri- 
bution defined by the kinetic energy term, that is an N dimensional 
Gaussian with a covariance matrix M. We then allow our system 
to evolve deterministically, from our starting point ({xj, {/?;}) in the 
phase space for some fixed pseudo time r according to Hamilton's 
equations: 

at dpi 

dpi dH dif/(x) 
dt 



(ID 



dxi dxi 

The integration of this equations of motion yields the new position 
({*•}, {p^}) in phase space. This new point is accepted according to 
the usual acceptance rule: 



Pa = min [hexp(- (#({*{}, - #({*,-}, ipM 



(12) 



Since the equations of motion provide a solution to a Hamiltonian 
system, energy or the Hamiltonian given in equation {8} is con- 
served, and therefore the solution to this system provides an ac- 
ceptance rate of unity. In practice however, numerical errors can 
lead to a somewhat lower acceptance rate. Once a new sample has 
been accepted the momentum variable is discarded and the process 
restarts by randomly drawing a new set of momenta. The individ- 
ual momenta {/?;} will not be stored, and therefore discarding them 
amounts to marginalizing over this auxiliary quantity. Hence, the 
Hamiltonian sampling procedure basically consists of two steps. 
The first step is a Gibbs sampling step, which yields a new set of 
Gaussian distributed momenta. The second step, on the other hand 
amounts to solving a dynamical trajectory on the posterior surface. 



5 EQUATIONS OF MOTION FOR A LOG-NORMAL 
POISSONIAN SYSTEM 

In the framework of Hamiltonian sampling the task of sampling 
from the lognormal Poissonian posterior reduces to solving the cor- 
responding Hamiltonian system. Given the posterior distribution 
defined in equation {6} we can write the potential if/({r k }) as: 
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(13) 

The gradient of this potential with respect to r t then yields the 
forces, given as: 

Win)) 



dr, 



Nf 



(1+5(^-1)/) 



-RiN 



dB(e r - 1) 



d(e r - 1) 



(14) 



Equation |T4| obviously is a very general formulation of the recon- 
struction problem, and it demonstrates that the Hamiltonian sam- 
pler can in principle deal with all kinds of nonlinearities, especially 
in the case of the bias operator B(x). However, for the sake of this 
paper, but without loss of generality, in the following we will as- 
sume a linear bias model: 

B(x) k = bx k , (15) 
where b is a constant linear bias factor. We then obtain the potential: 

Hin}) = X -ln(2ndet{Q)) 



s 



In 



■Nflnd +b(e r - 1)) 



-R k N(l+b(e r -l))] 



(16) 



and the corresponding gradient reads: 

Win)) 



dri 



(17) 



Inserting these results in equations |T0| and {TT} then yields the 
equations of motion: 



J 

and 



(18) 



dp \ 
dt 



N 



(l+b(e r i - 1)) 



RiN\be ri . (19) 



New points on the lognormal Poissonian posterior surface can then 
easily be obtained by solving for the trajectory governed by the 
dynamical equations (T8] l and |T9] >. 



6 NUMERICAL IMPLEMENTATION 

Our numerical implementation of the lognormal Poissonian Sam- 
pler is named HADES (Hamiltonian Density Estimation and Sam- 
pling). It utilizes the FFTW3 library for Fast Fourier Transforms 
and the GNU scientific library (gsl) for random number genera- 
tion (Frigo & Johnson 2005 ; Galassi et al. 2003). In particular, we 
use the Mersenne Twister MT 19937, with 32-bit word length, as 
provided by the gsl_rng_mtl9937 routine, which was designed for 
Monte Carlo simulations (Matsumoto & Nishimura 1998). 



6.1 The leapfrog scheme 

As described above, a new sample can be obtained by calculating 
a point on the trajectory governed by equations {T8j and (T9| . This 
means that if we are able to integrate the Hamiltonian system ex- 
actly energy will be conserved along such a trajectory, yielding a 
high probability of acceptance. However, the method is more gen- 
eral due to the Metropolis acceptance criterion given in equation 
(T2| ). In fact, it is allowed to follow any trajectory to generate a 
new sample. This would enable us to use approximate Hamilto- 
nians, which may be evaluated computationally more efficiently. 
Note, however, that only trajectories that approximately conserve 
the Hamiltonian given in equation (8} will result in high acceptance 
rates. 

In order to achieve an optimal acceptance rate, we seek to 
solve the equations of motion exactly. For this reason, we employ a 
leapfrog scheme for the numerical integration. Since the leapfrog is 
a symplectic integrator, it is exactly reversible, a property required 
to ensure the chain satisfies detailed balance (Du ane et al.|1 987). 
It is also numerically robust, and allows for simple propagation of 
errors. Here, we will implement the Kick-Drift- Kick scheme. The 
equations of motions are integrated by making n steps with a finite 
stepsize e, such that r = ne\ 
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Figure 2. Volume rendering of artificial galaxy counts, generated as described in section pTTT] The two pannels show different projections. Effects of survey 
geometry and selection function are clearly visible. The observer is centered at (0, 0, 0). 
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We iterate these equations until t = r. Also note, that it is important 
to vary the pseudo time interval r, to avoid resonant trajectories. We 
do so by drawing n and e randomly from a uniform distribution. For 
the time being we will employ the simple leapfrog scheme. How- 
ever, it is possible to use higher order integration schemes, provided 
that exact reversibility is maintained. 

6.2 Hamiltonian mass 

The Hamiltonian sampler has a large number of adjustable parame- 
ters, namely the Hamiltonian 'mass matrix, M, which can greately 
influence the sampling efficiency. If the individual r k were Gaus- 
sian distributed, a good choice for HMC masses would be to set 
them inversely proportional to the variance of that specific r k (Tay- 
|lor et al.|2 008 ). However, for non-Gaussian distributions, such as 
the lognormal Poissonian posterior, it is reasonable to use some 
measure of the width of the distribution (Tay lor et al.|2 008 ). Neal 
( 1996) proposes to use the curvature at the peak. 

In our case, we expanded the Hamiltonian given in equation 
|T6| in a Taylor series up to quadratic order for \r t \ « 1. This 
Taylor expansion yields a Gaussian approximation of the lognormal 
Poissonian posterior. Given this approximation and according to 
the discussion in Appendix [A] the Hamiltonian mass should be set 
as: 



M IJ = Qtf-((Nf-R,N)b-Nfti l )6f J . 



(23) 



However, calculation of the leapfrog scheme requires inversions of 
M. Considering the high dimensionality of the problem, inverting 
and storing M~ l is computationally impractical. For this reason we 
construct a diagonal 'mass matrix' from equation {231. We found, 



that choosing the diagonal of M, as given in equation ( [23] ), in its 
Fourier basis yields faster convergence for the sampler than a real 
space representation, since it accounts for the correlation structure 
of the underlying density field. 



6.3 Parallelization 

For any three dimensional sampling method, such as the lognormal 
Poisson sampler or the Gibbs sampler presented in Jasche (2009), 
CPU time is the main limiting factor. For this reason parallelization 
of the code is a crucial issue. Since our method is a true Monte 
Carlo method, there exist in principle two different approaches to 
parallelize our code. 

The numerically most demanding step in the sampling chain 
is the leapfrog integration with the evaluation of the potential. 
One could therefore parallelize the leapfrog integration scheme, 
which requires parallelizing the fast Fourier transform. The FFTW3 
library provides parallelized fast Fourier transform procedures, 
and implementation of those is straightforward ( Frigo & Johnson 
2005). However, optimal speed up cannot be achieved. The other 
approach relies on the fact that our method is a true Monte Carlo 
process, and each CPU can therefore calculate its own Markov 
chain. In this fashion, we gain optimal speed up and the possibility 
to initialize each chain with different initial guesses. 

The major difference between these two parallelization ap- 
proaches is, that with the first method one tries to calculate a rather 
long sampling chain, while the latter one produces many shorter 
chains. 



7 TESTING HADES 

In this section, we apply HADES to simulated mock observations, 
where the underlying matter signal is perfectly known. With these 
tests we will be able to demonstrate that the code produces results 
consistent with the theoretical expectation. Further more, we wish 
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Figure 3. Tests of the initial burn-in behavior for the two test cases of the fiducial calculation (right panels) and the full test taking into account the observational 
uncertainties of survey geometry and selection effects (left panels). The upper panels show successive point to point statistics between the individual samples 
and the true underlying mock signal. It can be seen that the successive Hamiltonian samples show increasing correlation with the true underlying signal. The 
lower panels show the successive Euclidean distances between samples and the true underlying signal during burn-in. 



to gain insight into how the code performs in real world applica- 
tions, when CPU time is limited. 

7.1 Setting up Mock observations 

In this section we will describe a similar testing setup as described 
in Jasche (2009). For the purpose of this paper we generate lognor- 
mal random fields according to the probability distribution given in 
equation |TJ. These lognormal fields are generated based on cosmo- 
logical power- spectra for the density contrast 5. We generate these 
power spectra, with baryonic wiggles, following the prescription 
described in psenstein & Hu| ( fT998] > and psenstein & Hu| |l999) 
and assuming a standard ACDM cosmology with the set of cosmo- 
logical parameters (Q m = 0.24, Q A = 0.76, Q b = 0.04, h = 0.73, 
<t 8 = 0.74, n s = 1 ). Given these generated density fields we draw 
random Poissonian samples according to the Poissonian process 
described in equation 

The survey properties are described by the galaxy selection 
function F t and the observation Mask M t where the product: 

Ri^FiMt (24) 

yields the linear response operator. 

The selection function is given by: 

where r t is the comoving distance from the observer to the center 



of the ith voxel. For our simulation we chose parameters b = 0.6, 
r = 500 Mpc and y = 2. 

In figure^ we show the selection function together with the 
sky mask, which defines the observed regions in the sky. The two 
dimensional sky mask is given in sky coordinates of right ascension 
and declination. We designed the observational mask to represent 
some of the prominent features of the Sloan Digital Sky Survey 
(SDSS) mask (see Aba zajian et al.|2009| for a description of the 
SDSS data release 7). The projection of this mask into the three 
dimensional volume yields the three dimensional mask M\. 

Two different projections of this generated mock galaxy sur- 
vey are presented in figure|2]to give a visual impression of the arti- 
ficial galaxy observation. 

7.2 Burn in behavior 

The theory described above demonstrates that the Hamiltonian 
sampler will provide samples from the correct probability distri- 
bution function as long as the initial conditions for the leapfrog in- 
tegration are part of the posterior surface. However, in practice the 
sampler is not initialized with a point on the posterior surface, and 
therefore an initial burn-in phase is required until a point on the cor- 
rect posterior surface is identified. As there exists no theoretical cri- 
terion, which tells us when the initial burn-in period is completed, 
we have to test this initial sampling phase through experiments. 
These experiments are of practical relevance for realworld applica- 
tions, as they allow us to estimate how many samples are required 
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Figure 4. Successive power spectra measured from the Hamiltonian samples during burn-in. The right panels correspond to the fiducial calculation, while 
the left panels display the burn-in behavior of the complete observational problem. The upper panels show the convergence of the individual sample spectra 
towards the spectrum of the true underlying matter field realization (black curve). The lower panels display the deviation from the true underlying spectrum 
demonstrating good convergence at the end of the burn-in period. 



before the sampler starts sampling from the correct posterior distri- 
bution. To gain intuition we set up a simple experiment, in which 
we set the initial guess for the lognormal field constant to unity 
(r? = 1). Therefore, the initial samples in the chain will be required 
to recover structures contained in the observation. In order to gain 
intuition for the behavior of our nonlinear Hamiltonian sampler, we 
compare two cases. The first case consists of an artificial observa- 
tion including selection effects and observational mask generated 
as described above. The second case is a comparison calculation, 
where we set the observation response operator R t = 1. In this latter 
fiducial case, only shot noise remains as observational uncertainty. 
It is important to note, that the individual Markov samples are un- 
biased in the sense that they possess the correct power information. 
Unlike a filter, which suppresses power in the low signal to noise 
regions, the Hamiltonian sampler draws true samples from the log- 
normal Poissonian posterior, given in equation {5}, once burn-in is 
completed. Therefore, a lognormal Poissonian sample has to be un- 
derstood as consisting of a true signal part, which can be extracted 
from the observation and a fluctuating component, which restores 
power lost due to the observation. This interpretation is similar to 
the behavior of the Gibbs sampler, as discussed in Jasche (2009), 
with the exception that there is no obvious way to separate the true 
signal part from the variance contribution for the nonlinear Hamil- 
tonian sampler. Hence, the lower the signal to noise ratio of the 
data, the higher will be the fluctuating component. 



three successive Markov samples to the true mock signal via a point 
to point statistics. It can be nicely seen, that the correlation with the 
true underlying mock signal improves as burn-in progresses. As 
expected, the fiducial calculation, shown in the right panels of [3] 
has a much better correlation with the underlying true mock signal 
than the full observation. This is clearly owing to the fact, that the 
full observation introduces much more variance than in the fidu- 
cial case. To visualize this fact further, we calculate the Euclidean 
distance between Markov samples and the true mock signal: 



(26) 



This effect can be observed in figure [3] where we compare field realization P f [ ue via: 



over the progression of the Markov chain. In the lower panels of 
figure [3] it can be observed that the Eucledian distance drops ini- 
tially and then saturates at a constant minimal d k . This minimal 
d k is related to the intrinsic variance contribution in the individual 
samples. While the variance is lower for the fiducial observation, it 
is higher for the full observation. 

As HADES produces unbiased samples, we can gain more de- 
tailed insight into the initial burn-in phase of the Markov chain, by 
following the evolution of successive power-spectra measured from 
the samples. In addition, we measure the deviation f * of the sample 
power spectra P k x to the power spectrum of the true mock matter 
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Figure 5. The upper panels show the point to point statistic of the ensemble mean field to the true underlying density field in the observed region for the 
fiducial calculation (right panel) and the full observational problem (left panel). The numbers in the upper left part of the plots correspond to the Euclidean 
distance d and the correlation factor c. In the lower panels we plotted the results of the Gelman&Rubin convergence diagnostic for the according tests. The 
PSRF indicate good convergence. 



pk _ ptrue 

£=-4s=r-- ( 27 > 

Figure [4] demonstrates that HADES completes burn-in after ~ 20 
samples in the case of the fiducial calculation (right panels). How- 
ever, the burn-in history for the full observation (left panels) reveals 
a more interesting behavior. 

Initially, the power spectra show huge excursions at large 
scales. This is due to the observational mask and the fact, that 
initially these regions are dominated by the constant initial guess 
(r° = 1). It is interesting to note, that the first sample seems to be 
much closer to the true underlying power specrum at the smaller 
scales, while the 20th samples is much further away. This clearly 
demonstrates the nonlinear behavior of the lognormal Poissonian 
sampler. We observe, that with iterative correction of the large scale 
power, the entire power spectrum progressively approaches the true 
mock power spectrum. This can be seen nicely in the lower left 
panel of figure [4] After one hundred samples have been calculated 
the true mock power spectrum is recovered for all following sam- 
ples. Thus, the initial burn-in period for a realistic observational 
setting can be expected to be on the order of 100 samples. Such a 
burn-in period is numerically not very demanding, and can easily 
be achieved in even higher resolution calculations. 

Further, we ran a full Markov analysis for both test cases, by 
calculating 20000 samples with a resolution of 64 3 voxels. We then 
estimate the ensemble mean and compared the recovered density 



field in the observed region via a point to point statistic to the true 
underlying mock signal. The results are presented in the upper pan- 
els of figure [5] It can be seen that both results are strongly corre- 
lated with the true underlying signal. To emphasize this fact, we 
also calculate the correlation factor given as: 



The correlation factors for the two test scenarios are also given in 
figure [5] They clearly demonstrate, that the Hamiltonian sampler 
was able to recover the underlying density field to high accuracy in 
both cases. 

7.3 Convergence 

Testing the convergence of Markov chains is subject of many dis- 
cussions in literature (se e e.g.[H eidelberger & Welch 1981 ; Gelman 
|& Rubin|1992l|Geweke||1992HRaftery & Lewis ||1 995 [|Cowles"& 
|Carlin|1996[|Hanson|20011|Dunkley et al.|2005| >. In principle, there 
exist two categories of possible diagnostics. The methods of the 
first category rely on comparing inter chain quantities between sev- 
eral chains while others try to estimate the convergence behavior 
from inter chain quantities within a single chain. In this paper we 
use the widely used Gelman&Rubin diagnostic, which is based on 
multiple simulated chains by comparing the variances within each 
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chain and the variance between chains (Gelman & Rubin 1992 ). In 
particular, we calculate the potential scale reduction factor (PSRF) 
(see Appendix [B] for details). A large PSRF indicates that the inter 
chain variance is substantially greater than the intra chain variance 
and longer chains are required. Once the PSRF approaches unity, 
one can conclude that each chain has reached the target distribution. 

We calculated the PSRF for each voxel of our test cases for 
chains with length N samp = 20000. The results for the two tests, as 
discussed above, are presented in figure[5] They clearly indicate the 
convergence of the Markov chains. 

For the time being we use the Gelman&Rubin statistic to test 
convergence because of technical simplicity, although for the ex- 
pense of having to calculate at least two chains. In the future we 
plan to explore other convergence diagnostics. In particular we are 
aiming at including intra chain methods as proposed in Hanson 
(2001) or Dunkley et al. (2005). This would allow us to detect con- 
vergence behavior within the chain during burn-in. Such a conver- 
gence criterion could then be used to adjust the Hamiltonian masses 
for optimal sampling efficiency, as was proposed in |Taylor et al.| 
( [2008} . 

7.4 Testing with simulated galaxy surveys 

In this section, we describe the application of HADES to a mock 
galaxy survey based on the millennium run ( [Croton et al. 2006). 
The intention of this exercise is to test HADES in a realistic ob- 
servational scenario. In particular, we want to demonstrate that 
HADES is able to reconstruct the fully evolved nonlinear density 
field of the N-body simulation. The mock galaxy survey consists of 
a set of comoving galaxy positions distributed in a 500 Mpc box. 
To introduce survey geometry and selection effects, we virtually 
observe these galaxies through the sky mask and according to the 
selection function described in section |7.1| The resulting galaxy 
distribution is then sampled to a 128 3 grid. This mock observation 
is then processed by HADES, which generates 20000 lognormal 
Poissonian samples. 

In figure [6] we present successive slices through density sam- 
ples of the initial burn-in period. As can be seen, the first Hamilto- 
nian sample (upper panels in figure [6} is largely corrupted by the 
false density information in the masked regions. This is due to the 
fact, that the Hamiltonian sampler cannot be initialized with a point 
on the posterior surface. The initial samples are therefore required 
to identify a point on the according posterior surface. As can be 
seen, the power in the unobserved and observed regions equalizes 
in the following samples. Also note, that the first density sample 
depicts only very coarse structures. However, subsequent samples 
resolve finer and finer details. With the hundredth sample burn-in 
is completed. The lower panels of figure [6] demonstrate, that the 
Hamiltonian sampler nicely recovers the filamentary structure of 
the density field. 

Being a fully Bayesian method, the Hamiltonian sampler does 
not aim at calculating only a single estimate, such as a mean or 
maximum a postiori value, it rather produces samples from the full 
lognormal Poissonian posterior. Given these samples we are able to 
calculate any desired statistical summary. In particular, we are able 
to calculate the mean and the according variance of the Hamiltonian 
samples. 

In figure [7] we show three different volume renderings of the 
ensemble mean density and the according ensemble variance fields. 
It can be seen that the variance projections nicely reflect the Pois- 
sonian noise structure. Comparing high density regions in the en- 
semble mean projections to the corresponding positions in the vari- 



ance projections, reveals a higher variance contribution for these 
regions, as expected for Poissonian noise. This demonstrates, that 
our method allows us to provide uncertainty information for any 
resulting final estimate. 



8 SUMMARY AND CONCLUSION 

In this work we introduced the Hamiltonian Monte Carlo sampler 
for nonlinear large scale structure inference and demonstrated its 
performance in a variety of tests. As already described above, ac- 
cording to observational evidence and theoretical considerations, 
the posterior for nonlinear density field inference is adequately rep- 
resented by a lognormal Poissonian distribution, up to overdensities 
of 6 ~ 100. Hence, any method aiming at precision estimation of 
the fully evolved large scale structure in the Universe needs to han- 
dle the nonlinear relation between observations and the signal we 
seek to recover. The Hamiltonian Monte Carlo sampler, presented 
in this work, is a fully Bayesian method, and as such tries to eval- 
uate the lognormal Poissonian posterior, given in equation [5] via 
sampling. In this fashion, the scientific output of the method is not 
a single estimate, but a sampled representation of the multidimen- 
sional posterior distribution. Given this representation of the pos- 
terior any desired statistical summary, such as mean, mode or vari- 
ances can easily be calculated. Further, any uncertainty can seam- 
lessly be propagated to the finally estimated quantities, by sim- 
ply applying the according estimation procedure to all Hamiltonian 
samples. 

Unlike conventional Metropolis Hastings algorithms, which 
move through the parameter space by random walk, the Hamilto- 
nian Monte Carlo sampler suppresses random walk behavior by 
following a persistent motion. The HMC exploits techniques de- 
veloped to follow classical dynamical particle motion in poten- 
tials, which, in the absence of numerical errors, yield an acceptance 
probability of unity. Although, in this work we focused on the use 
of the lognormal Poissonian posterior, the method is more general. 
The discussion of the Hamiltonian sampler in section [4] demon- 
strates that the method can in principle take into account a broad 
class of posterior distributions. 

In section [7] we demonstrated applications of the method to 
mock test cases, taking into account observational uncertainties 
such as selection effects, survey geometries and noise. These tests 
were designed to study the performance of the method in real world 
applications. 

In particular, it was of interest to establish intuition for the be- 
havior of the Hamiltonian sampler during the initial burn-in phase. 
Especially, the required amount of samples before the sampler 
starts drawing samples from the correct posterior distribution was 
of practical relevance. The tests demonstrated, that for a realistic 
setup, the initial burn-in period is on the order of ~ 100 samples. 

Further, the tests demonstrated that the Hamiltonian sampler 
produces unbiased samples, in the sense that each sample possesses 
correct power. Unlike a filter, which suppresses the signal in low 
signal to noise regions, the Hamiltonian sampler nonlinearly aug- 
ments the poorly or not observed regions with correct statistical in- 
formation. In this fashion, each sample represents a complete mat- 
ter field realization consistent with the observations. 

The convergence of the Markov chain was tested via a Gel- 
man&Rubin diagnostic. We compared the intra and inter chain vari- 
ances of two Markov chains each of length 20000 samples. The 
estimated PSRF indicated good convergence of the chain. This re- 
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Figure 6. Slices through density samples during the initial burn-in phase. The upper panels correspond to the first sample, middle panels show the tenth sample 
and the lower panels present the hundredth sample. Left and right panels show two different slices through the corresponding sample. It can be seen that during 
the initial burn-in phase power equalizes between the observed and unobserved regions. Successive samples recover finer and finer details. 
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Figure 7. Volume rendering of the ensemble variance (upper panels) and the ensemble mean (lower panels) obtained from the mock galaxy catalog analysis 
for three different perspectives. The mean shows filigree structures which have been recovered. It can also be seen that the ensemble variance reflects the 
Poissonian behavior of the noise. High density regions in the ensemble mean field correspond to regions with high variance as is expected for a Poissonian 
shot noise contribution. 



suit demonstrates, that it is possible to efficiently sample from non- 
Gaussian distributions in very high dimensional spaces. 

In a final test the method was applied to a realistic galaxy 
mock observation based on the millennium run ( Croton et al. 2006 ). 
Here we introduced again survey geometry and selection effects 
and generated 20000 samples of the lognormal Poissonian poste- 
rior. The results nicely demonstrate that the Hamiltonian sampler 
recovers the filamentary structure of the underlying matter field re- 
alization. For this test we also calculated the ensemble mean and 
the corresponding ensemble variance of the Hamiltonian samples, 
demonstrating that the Hamiltonian sampler also provides error in- 
formation for a final estimate. 

To conclude, in this paper we present a new and numerically 
efficient Bayesian method for large scale structure inference, and its 
numerical implementation HADES. HADES provides a sampled 
representation of the very high dimensional non-Gaussian large 
scale structure posterior, conditional on galaxy observations. This 
permits us to easily calculate any desired statistical summary, such 
as mean, mode and variance. In this fashion HADES is able to pro- 
vide uncertainty information to any final quantity estimated from 
the Hamiltonian samples. The method, as presented here, is very 
flexible and can easily be extended to take into account additional 
nonlinear observational constraints and joint uncertainties. 

In summary, HADES, in its present form, provides the basis 
for future nonlinear high precision large scale structure analysis. 



ACKNOWLEDGMENTS 

We would like to thank Rainer Moll and Bjorn Malte Schafer for 
usefull discussions and support with many valuable numerical gad- 
gets. Further, we thank R. Benton Metcalf, for useful remarks and 
commentaries, and Torsten A. EnBlin for encouraging us to persue 
this project. Special thanks also to Maria Angeles Bazarra-Castro 
for helpful assistance during the course of this project. We also 
thank the "Transregional Collaborative Research Centre TRR 33 
- The Dark Universe" for the support of this work. 



© 2006 RAS, MNRAS 000,[T}{T4] 



12 Jens Jasche 1 , Francisco Shu Kitaura 



REFERENCES 

Abazajian K. N., et al., 2009, ApJS, 182, 543 

Bertschinger E., Dekel A., 1989, ApJL, 336, L5 

Bertschinger E., Dekel A., 1991, in Latham D. W., da Costa 
L. A. N., eds, ASP Conf. Ser. 15: Large-Scale Structures and 
Peculiar Motions in the Universe Mapping Large-Scale Flows in 
Three Dimensions: Method, pp 67 — h 

Bistolas V., Hoffman Y, 1998, ApJ, 492, 439 

Coles P, Jones B., 1991, MNRAS, 248, 1 

Cowles M. K., Carlin B. P, 1996, Journal of the American Statis- 
tical Association, 91, 883 

Croton D. J., Springel V., White S. D. M., De Lucia G., Frenk 
C. S., Gao L., Jenkins A., Kauffmann G., Navarro J. E, Yoshida 
N.,2006, MNRAS, 365, 11 

Duane S., Kennedy A. D., Pendleton B. J., Roweth D., 1987, 
Physics Letters B, 195, 216 

Dunkley J., Bucher M., Ferreira P G., Moodley K., Skordis C, 
2005, MNRAS, 356, 925 

Eisenstein D. J., Hu W, 1998, ApJ, 496, 605 

Eisenstein D. J., Hu W, 1999, ApJ, 511, 5 

Ensslin T. A., Frommert M., Kitaura F. S., 2008, ArXiv e-prints 

Erdogdu P, Lahav O., Zaroubi S., et al. 2004, MNRAS, 352, 939 

Fisher K. B., Lahav O., Hoffman Y, Lynden-Bell D., Zaroubi S., 
1995, MNRAS, 272, 885 

Frigo M., Johnson S. G., 2005, Proceedings of the IEEE, 93, 216 

Galassi M., Davies J., Theiler J., Gough B., Jungman G., Booth 
M., Rossi F, 2003, Gnu Scientific Library: Reference Manual. 
Network Theory Ltd. 

Gaztanaga E., Yokoyama J., 1993, ApJ, 403, 450 

Gelman A., Rubin D., 1992, Statistical Science, 7, 457 

Geweke J., , 1992, Evaluating the Accuracy of Sampling-Based 
Approaches to the Calculation of Posterior Moments 

Hajian A., 2007, Phys. Rev. D, 75, 083525 

Hanson K. M., 2001, in Sonka M., Hanson K. M., eds, Society of 
Photo-Optical Instrumentation Engineers (SPIE) Conference Se- 
ries Vol. 4322 of Society of Photo-Optical Instrumentation En- 
gineers (SPIE) Conference Series, Markov chain Monte Carlo 
posterior sampling with the Hamiltonian method, pp 456-467 

Heidelberger P., Welch P. D., 1981, Commun. ACM, 24, 233 

Hoffman Y, 1994, in Balkowski C, Kraan-Korteweg R. C, 
eds, Unveiling Large-Scale Structures Behind the Milky Way 
Vol. 67 of Astronomical Society of the Pacific Conference Se- 
ries, Wiener Reconstruction of the Large-Scale Structure in the 
Zone of Avoidance, pp 185-+ 

Hubble E., 1934, ApJ, 79,8 

Jasche J., , 2009, Bayesian power- spectrum inference for Large 

Scale Structure data 
Kayo I., Taruya A., Suto Y, 2001, ApJ, 561, 22 
Kitaura F. S., EnBlin T. A., 2008, MNRAS, 389, 497 
Kitaura F. S., Jasche J., Li C, Ensslin T. A., Metcalf R. B., Wan- 

delt B. D., Lemson G., White S. D. M., 2009, ArXiv e-prints 
Kitaura F. S., Jasche J., Metcalf R. B., 2009, ArXiv e-prints 
Lahav O., Fisher K. B., Hoffman Y, Scharf C. A., Zaroubi S., 

1994, ApJL, 423, L93+ 
LayzerD., 1956, AJ, 61, 383 

Martmez V. J., Saar E., 2002, Statistics of the Galaxy Distribution. 

Chapman &amp 
Matsumoto M., Nishimura T., 1998, ACM Trans. Model. Comput. 

Simul., 8, 3 

Neal R. M., 1993, Technical Report CRG-TR-93-1, Probabilistic 
inference using Markov chain Monte Carlo methods. University 



of Toronto 

Neal R. M., 1996, Bayesian Learning for Neural Networks (Lec- 
ture Notes in Statistics), 1 edn. Springer 

Peebles P. J. E., 1980, The large-scale structure of the universe 

Raftery A. E., Lewis S. M., 1995, in In Practical Markov Chain 
Monte Carlo (WR. Gilks, D.J. Spiegelhalter and The number of 
iterations, convergence diagnostics and generic metropolis algo- 
rithms. Chapman and Hall, pp 115-130 

Saunders W, Ballinger W. E., 2000, in Kraan-Korteweg R. C, 
Henning P. A., Andernach H., eds, Mapping the Hidden Uni- 
verse: The Universe behind the Mily Way - The Universe in 
HI Vol. 218 of Astronomical Society of the Pacific Conference 
Series, Interpolation of Discretely-Sampled Density Fields, pp 
181 — h 

Schmoldt I. M., Saar V, Saha P., Branchini E., Efstathiou G. P., 
Frenk C. S., Keeble O., Maddox S., McMahon R., Oliver S., 
Rowan-Robinson M., Saunders W, Sutherland W. J., Tadros H., 
White S. D. M., 1999, ApJ, 118, 1146 

Taylor J. F, Ashdown M. A. J., Hobson M. P., 2008, MNRAS, 
389, 1284 

Webster M., Lahav O., Fisher K., 1997, MNRAS, 287, 425 
Zaroubi S., 2002, MNRAS, 331, 901 



© 2006 RAS, MNRAS 000,[T}{T4] 



Fast Hamiltonian Sampling for large scale structure inference 13 



APPENDIX A: HAMILTONIAN MASSES 

The Hamiltonian sampler can be extremely sensitive to the choice 
of masses. To estimate a good guess of Hamiltonian masses we 
follow a similar approach as suggested in |Taylor et al.|(2 008). Ac- 
cording to the leapfrog scheme, given in equations ( |20| , (21} and 
( |22| , a single application of the leapfrog method can be written in 
the form: 



Pi(t + e) = Piit) - 



r(t) & r i 



n(t + e) = n(t) + 6 ^ m;/ Pj (t) -jj] 



it+e) 



M7, 



d¥(r) 



drj 



(Al) 



(A2) 



r(t) 



We will then approximate the forces given in equation |T7] > for 

r, << 1: 



Nf 
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By introducing: 
and 

D, = Y J (S^ J -(Nf-R,N)b, 

j 

equation ( |A3| ) simplifies to: 
dif/(r) , 



R,N\be n 



(A3) 
(A4) 
(A5) 

(A6) 



Introducing this approximation into equations \A\) and \A2) 
yields: 
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(A8) 



This result can be rewritten in matrix notation as: 

i 
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where the matrix r is given as: 
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(A9) 



(A10) 



with I being the identity matrix. Successive applications of the 
leapfrog step yield the following propagation equation: 



P n 



Z r ' 



M~ l D 
e - yAM -1 j D 



(All) 



This equation demonstrates, that there are two criteria to be ful- 
filled if the method is to be stable under repeated application of the 
leapfrog step. First we have to ensure, that the first term of equation 
( [Al 1 1 » does not diverge. This can be fulfilled if the eigenvalues of 
T have unit modulus. The eigenvalues A are found by solving the 
characteristic equation: 



det 



I A - 2 A 1 1 



-AM + 1 



: 0. 



(A12) 



Note, that this is a similar result to what was found in Taylor ^FaT] 
(2008J. Our aim is to explore the parameter space rapidly, and 
therefore we wish to choose the largest e still compatible with the 
stability criterion. However, any dependence of equation \A\2\ also 
implies, that no single value of e will meet the requirement for ev- 
ery eigenvalue to have unit modulus. For this reason we choose: 



A - M . 

We then yield the characteristic equation: 



(A13) 
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■ 0, 



(A14) 



where N is the number of voxels. This yields the eigenvalues: 



A = +iil 





2 






+ 











(A15) 



which have unit modulus for 6^2. The second term in equation 
\A\1\ involves evaluation of the geometric series Z"=o T l . How- 
ever, the geometric series for a matrix converges if and only if 
| A; | < 1 for each A t eigenvalue of T. This clarifies, that the non- 
linearities in the Hamiltonian equations generally do not allow for 
arbitrary large pseudo time steps e. In addition, for practical pur- 
poses we usually restrict the mass matrix to the diagonal of equa- 
tion ( |A4| ). For these two reasons, in practize, we choose the pseudo 
timestep e as large as possible while still obtaining a reasonable 
rejection rate. 



APPENDIX B: GELMAN&RUBIN DIAGNOSTIC 

The Gelman&Rubin diagnostic is a multichain convergence test 
(Gelman & Rubin 1992). It is based on analyzing multiple Markov 
chains by comparing intra chain variances, within each chain, and 
inter chain variances. A large deviation between these two vari- 
ances indicates nonconvergence of the Markov chain. Let {0^}, 
where k = l,...,N, be the collection of a single Markov chain 
output. The parameter (p k is the Mi sample of the Markov chain. 
Here, for notational simplicity, we will assume to be single di- 
mensional. To test convergence with the Gelman&Rubin statistic, 
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one has to calculate M parallel MCMC chains, which are initial- 
ized from different parts of the target distribution. After discard- 
ing the initial burn-in samples, each chain is of length n. We can 
then label the outputs of various chains as (p k m , with k - 1, n and 
m = 1, M. The inter chain variance B can then be calculated as: 

M 
m=l 

where 6 m is given as: 

Om = -Y.<f m , (B2) 

and as: 
1 M 

m=l 

Then the intra chain variance can be calculated as: 
1 M 

w= mTA> tB4) 

m=l 

with : 



») 2 • (B5) 



With the above definition the marginal posterior variance can be 
estimated via: 

V= 1Z± W+ ^± B . (B6 ) 
n nM 

If all M chains have reached the target distribution, this posterior 
variance estimate should be very close to the intra chain variance 
W. For this reason, one expects the ratio V/W to be close to 1. The 
square root of this ratio is reffered to as the potential scale reduction 
factor (PSRF): 

PSRF=^. (B7) 

If the PSRF is close to one, one can conclude that each chain has 
stabilized, and has reached the target distribution (Gelm an & Rubin| 
[T992] >. 
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